1. Research Background and Motivation

Data  assimilation  addresses  the  dual  goals  of  (1)  producing  the  best  estimate  of  the  state  of  the 
physical  system  (e.g.,  atmosphere,  ocean,  land)  and  (2)  quantifying  the  uncertainty  on  that 
estimate.  All  modern  data  assimilation  systems  are  based  on  the  idea  that  the  most  complete 
information  about  the  state  of  a  system  can  be  expressed  as  a  conjunction  of  information 
contained  in  a  prior  estimate,  a  set  of  observations,  and  a  model  of  the  physical  system. 
Probability  distribution  functions  (PDFs)  are  used  to  define  each  type  of  information,  and  the 
solution  is  obtained  from  the  joint  probability  distribution  of  the  control  variable(s)  and  the 
available  information.  Over  the  past  50  years,  research  in  statistical  estimation  techniques  and 
their  application  to  large  systems  has  resulted  in  estimates  of  the  atmosphere,  ocean,  and  land 
state  that  are  robust  for  most  cases.  However,  significant  challenges  remain  to  be  addressed.  Key 
among  these  are  how  to  properly  account  for  model  error  and  how  to  produce  estimates  for 
highly nonlinear systems, especially for high-impact weather events (e.g., severe storms).

Ensemble  data  assimilation  algorithms  have  increasingly  been  used  in  nonlinear  systems 
because  they  do  not  require  use  of  a  linear  approximation  to  the  forecast  model  (e.g.,  tangent 
linear  model  or  adjoint).  Ensemble  assimilation  algorithms  produce  a  solution  by  generating  a 
sample  of  the  joint  PDF  of  interest,  but  are  subject  to  potentially  limiting  assumptions  about 
these probabilities (e.g., Gaussian). By contrast, Markov chain Monte Carlo (MCMC) algorithms
require  no  specific  form  of  the  probability  distributions  of  interest,  and  produce  a  sample  of  the 
true  solution  probability.  MCMC  has  been  used  to  effectively  characterize  uncertainty  and 
information  content  in  remote  sensing  retrievals  (Posselt  et  al.,  2008;  Posselt  and  Mace,  2014), 
and  to  assess  the  uncertainty  in  model  physics  parameterizations  (Posselt  and  Vukicevic,  2010). 
MCMC algorithms are well-suited to non-Gaussian estimation in the presence of nonlinearities,
but are computationally expensive and are practical only for relatively simple models
and low-dimensional systems. The research conducted in the course of this project was designed
to use MCMC to evaluate two modern data assimilation systems developed at the Naval
Research Laboratory in Monterey, CA. In the process, it sought to move the data assimilation
community in the direction of an operational particle filter-based data assimilation system.

The  following  tasks  were  conducted  in  the  course  of  the  three-year  project: 

1. Examine the strengths and limitations of Ensemble Kalman Filter (EnKF) type data
assimilation  algorithms  when  applied  to  estimation  of  convective  cloud  system 
properties. 

2.  Determine  whether  EnKF  algorithms  are  capable  of  representing  rapid  changes  in  the 
state  associated  with  transitions  between  convective  and  stratiform  precipitation. 

3.  Explore  the  degree  to  which  EnKF  techniques  can  properly  return  estimates  of  positive 
definite quantities (e.g., cloud content).

4.  Assess  whether  a  recent  innovation  on  traditional  EnKF  algorithms  (the  quadratic  filter; 
Hodyss, 2011) is capable of improving the representation of positive definite quantities.

Each of these four tasks draws on the PI's expertise with nonlinear data assimilation methods,
and leverages the resources and expertise available at the Naval Research Laboratory in
Monterey, CA. In the following sections, the results obtained in each of the above task areas are
briefly described, along with reference to relevant publications.

2. Estimation of Convective Cloud System Properties Using an EnKF Algorithm

This  work  was  designed  to  assess  the  effectiveness  of  an  Ensemble  Transform  Kalman  Filter 
(ETKF)  to  represent  convective  processes.  Previous  research  found  that  the  probability  density 
functions (PDFs) of cloud microphysical parameters were non-Gaussian, and in many cases, had
a  non-unique  solution  (multiple  PDF  modes).  In  this  work,  we  built  upon  the  work  of  Posselt  and 
Vukicevic  (2010),  and  used  MCMC  to  examine  the  degree  to  which  an  ETKF  algorithm  was  able 
to  characterize  non-Gaussian  PDFs.  We  utilized  a  column  convective  model  and  built  an  ETKF 
algorithm  suitable  for  performing  model  parameter  estimation. 

The  major  findings  of  this  research  consisted  of  the  following: 

1. The effect of model nonlinearity on the posterior PDF is included in ETKS in that the
nonlinear  model  is  allowed  to  respond  to  changes  in  model  parameters;  the  full  nonlinear  model 
propagates  perturbations  forward  in  time.  However,  these  changes  are  constrained  by  the 
requirement  that  the  ETKS  posterior  perturbations  be  strictly  linear  functions  of  the  prior 
perturbations.  In  contrast,  the  accept-reject  procedure  of  the  MCMC  finds  posterior  perturbations 
that  can  be  any  function  of  the  prior  perturbations. 

2. Ensemble Kalman Smoothers can preserve key aspects of non-Gaussian priors. Specifically,
the  ETKF  was  found  to  give  a  qualitatively  accurate  multi-modal  posterior  PDF  when  given  an 
accurate  multi-modal  prior.  The  implication  is  that  it  is  not  the  Gaussian  assumption  used  in  the 
derivation of the ETKS that causes misrepresentation of the posterior. Instead, it is the lack of
information  on  higher  moments  and/or  multiple  modes  in  the  prior  ensemble. 
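The two findings above can be illustrated with a minimal sketch. It assumes a single scalar parameter that is observed directly (observation operator equal to 1), which is far simpler than the column model used in the project, and the function name `etkf_scalar_update` and all numerical values are hypothetical. Under those assumptions, a deterministic ensemble update rescales every prior perturbation by the same factor, so the shape of the prior (here, two modes) carries over to the analysis ensemble.

```python
import numpy as np

def etkf_scalar_update(prior, y_obs, obs_var):
    """Deterministic (square-root) ensemble update for a directly observed scalar."""
    mean_f = prior.mean()
    pert_f = prior - mean_f                      # prior perturbations
    var_f = pert_f.var(ddof=1)
    gain = var_f / (var_f + obs_var)             # scalar Kalman gain with H = 1
    mean_a = mean_f + gain * (y_obs - mean_f)    # analysis mean
    scale = np.sqrt(1.0 - gain)                  # analysis variance = (1 - gain) * var_f
    pert_a = scale * pert_f                      # strictly linear in the prior perturbations
    return mean_a + pert_a, scale

rng = np.random.default_rng(0)
# Hypothetical bimodal prior: equal mixture of two narrow Gaussians at -2 and +2.
prior = np.concatenate([rng.normal(-2.0, 0.3, 500), rng.normal(2.0, 0.3, 500)])
analysis, scale = etkf_scalar_update(prior, y_obs=0.5, obs_var=1.0)

# Fraction of analysis members in the gap between the two modes.
gap_fraction = np.mean(np.abs(analysis - analysis.mean()) < 0.3)
print("perturbation scale factor:", scale)
print("fraction of members between the modes:", gap_fraction)
```

Because the update is a shift plus a uniform rescaling, the analysis ensemble keeps two well-separated clusters. The same property means that a uni-modal prior ensemble stays uni-modal after the update.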

3.  Response  of  ETKF  Estimates  to  Changes  in  Convective  Regime 

In  this  portion  of  the  research,  we  used  the  same  column  model  framework  described  in  section  2 
to  explore  the  extent  to  which  EnKF  type  algorithms  are  capable  of  tracking  changes  in  cloud 
regime  with  time.  Specifically,  the  model  is  designed  to  simulate  a  transition  between  deep 
convection and stratiform rainfall halfway through its three-hour integration. The PDFs of cloud
microphysical  variables  change  significantly  at  this  transition  point  due  to  the  influence  of 
different  parameters  at  different  stages  of  the  convective  life  cycle.  MCMC  naturally  tracks  the 
effect  of  these  changes  on  the  model  output,  but  it  was  unclear  whether  an  EnKF  algorithm  is 
capable  of  doing  the  same.  We  generated  posterior  probability  distributions  using  sequentially 
greater  numbers  of  observations  in  time,  and  evaluated  the  efficacy  of  the  EnKF  as  compared 
with  MCMC. 

The  major  conclusions  of  this  study  were  the  following: 

1. Ensemble Kalman smoothers perform poorly when the posterior mean and perturbations are
strongly  non-linear  functions  of  the  forecast  error.  This  was  evidenced  by  a  failure  of  the  ETKF 
to  represent  the  transition  of  the  posterior  PDF  from  a  uni-modal  to  multi-modal  form  when 
observations  of  the  stratiform  phase  of  cloud  evolution  were  assimilated.  Though  the  ETKS  is 
unable to produce a multi-modal analysis from a uni-modal prior, the posterior PDF produced by
both  the  deterministic  and  perturbed  observations  versions  of  the  ETKS  is  clearly  non-Gaussian. 

2. The uncertainty characteristics of model physics parameterizations depend critically on the
characteristics  of  the  environment.  Abrupt  changes  in  the  physical  environment  can  lead  to 
similarly abrupt changes in parameter uncertainty. Ensemble Kalman Filter-type algorithms, while
not  capable  of  capturing  rapid  (nonlinear)  transitions  in  the  nature  of  the  posterior  PDF,  can  be 
shown  to  perform  well  if  provided  with  a  robust  prior  ensemble.  As  such,  Ensemble-Kalman- 
Filter-type  algorithms  have  promise  as  cost-efficient  methods  for  model  parameter  estimation. 
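The transition from a uni-modal prior to a multi-modal posterior can be reproduced in a toy setting. The sketch below is a generic random-walk Metropolis sampler, not the MCMC code used in the project. It assumes a scalar parameter with a Gaussian prior and a hypothetical non-monotonic observation operator h(x) = x**2, and all names and numerical values are illustrative only.

```python
import numpy as np

def log_posterior(x, y_obs, obs_var, prior_mean, prior_var):
    """Log of (Gaussian prior) x (Gaussian likelihood) for the toy operator h(x) = x**2."""
    log_prior = -0.5 * (x - prior_mean) ** 2 / prior_var
    log_like = -0.5 * (y_obs - x ** 2) ** 2 / obs_var
    return log_prior + log_like

def metropolis(n_samples, step, rng, **kwargs):
    """Random-walk Metropolis sampler with an accept-reject step."""
    x = 0.1                                   # arbitrary starting point
    lp = log_posterior(x, **kwargs)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        lp_prop = log_posterior(proposal, **kwargs)
        # Accept with probability min(1, posterior ratio); otherwise keep the current value.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain

rng = np.random.default_rng(1)
chain = metropolis(20000, step=1.0, rng=rng,
                   y_obs=1.0, obs_var=0.25, prior_mean=0.0, prior_var=4.0)
samples = chain[2000:]                        # discard burn-in

near_modes = np.mean(np.abs(np.abs(samples) - 1.0) < 0.2)
near_zero = np.mean(np.abs(samples) < 0.2)
print("fraction of samples with x > 0:", np.mean(samples > 0))
print("mass near x = +/-1:", near_modes, " mass near x = 0:", near_zero)
```

The prior is uni-modal and centered on zero, yet the sampled posterior concentrates near both x = -1 and x = +1. A linear ensemble update applied to the same uni-modal prior cannot split the ensemble in this way.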

4.  Determination  of  whether  ETKF  Algorithms  can  Represent  Positive  Definite  Quantities 

This  research  addressed  the  question  of  whether  ensemble  filters,  which  employ  the  full 
nonlinear  model,  are  capable  of  representing  quantities  that  are  hard  bounded.  In  this  case,  the 
quantities  of  interest  are  cloud  microphysical  parameters  that  are  hard  bounded  at  a  value  of  zero. 
As  in  the  research  described  in  sections  (2)  and  (3)  above,  this  work  employed  a  column 
convection  model  with  control  parameters  that  were  tunable  constants  in  the  cloud  microphysical 
scheme.  In  contrast  to  previous  work,  which  examined  only  one  set  of  microphysical  values,  this 
work tested several sets of values. The goals were to determine (1) how the posterior PDF
changes  with  true  state,  and  (2)  whether  the  EnKF  estimate  degrades  with  proximity  to  a  hard 
bound. 

The  major  conclusions  of  this  work  were  the  following: 

1. The true analysis ensemble, as constructed from samples of the Bayesian posterior
distribution,  changes  shape  significantly  with  changes  in  the  true  parameter  set  for  a  model  in 
which  control  parameters  are  nonlinearly  related  to  the  observations. 

2.  Multimodality  is  realized  only  in  certain  regions  of  the  parameter  space,  and  is  associated  with 
non-monotonicity  in  the  parameter-observation  response  function. 

3.  EnKF  algorithms  produce  PDFs  with  increasing  probability  mass  at  non-physical  values  as 
parameter  values  approach  zero.  In  fact,  for  parameters  very  close  to  zero,  the  posterior  mean 
may  be  non-physical. 

4.  The  slope  of  the  parameter-observation  response  function  determines  parameter  sensitivity, 
and, by extension, the posterior variance. A constant response function derivative leads to
posterior  variance  that  is  independent  of  the  true  parameter  value.  This  is  consistent  with  results 
found by Hodyss (2011), who showed that the first derivative of the response function with
respect  to  observations  determines  the  posterior  variance  while  the  second  derivative  determines 
the  posterior  third  moment. 
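The behavior near a hard bound (conclusion 3 above) can be sketched with a toy perturbed-observation EnKF. The example assumes a directly observed scalar parameter, a hypothetical gamma-distributed (non-negative) prior ensemble, and an error-free observation value equal to the truth. None of these choices come from the project's column model.

```python
import numpy as np

def enkf_perturbed_obs(prior, y_obs, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF update for a directly observed scalar."""
    var_f = prior.var(ddof=1)
    gain = var_f / (var_f + obs_var)                     # scalar Kalman gain, H = 1
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), prior.size)
    return prior + gain * (y_pert - prior)               # nothing here enforces x >= 0

rng = np.random.default_rng(2)
# Hypothetical non-negative prior ensemble for a positive definite parameter.
prior = rng.gamma(shape=2.0, scale=0.5, size=5000)
obs_var = 0.04

frac_negative = {}
for truth in (1.0, 0.3, 0.05):
    analysis = enkf_perturbed_obs(prior, y_obs=truth, obs_var=obs_var, rng=rng)
    frac_negative[truth] = np.mean(analysis < 0.0)       # non-physical members
    print(f"true value {truth}: fraction of members below zero = {frac_negative[truth]:.3f}")
```

Every prior member is non-negative, but the update is unaware of the bound, so the fraction of analysis members below zero grows as the true value approaches zero.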

5.  Evaluation  of  a  Quadratic  Ensemble  Filter 

In  this  final  portion  of  the  work,  we  explored  the  extent  to  which  an  ensemble  filter  that  accounts 
for  skewness  in  the  PDFs  of  interest  can  improve  the  solution  PDF  for  nonlinear  and  non- 
Gaussian  quantities.  We  implemented  a  version  of  the  Quadratic  Ensemble  Filter  (QEF), 
developed at NRL-Monterey, and tested it for the same convective system used in topics (2) -
(4)  above.  The  major  conclusions  of  this  work  were  the  following: 

1. The true error distribution for a given set of observations (the "error of the day") is not
reproduced  by  ensemble  smoothers.  Instead,  the  distribution  produced  by  ensemble  filters 
(ETKF  and  QEF)  consists  of  the  expected  analysis  error  covariance  matrix  produced  by 
integrating  over  all  possible  observations  for  a  given  prior. 

2.  When  approximate  ensemble  solutions  are  compared  with  the  integral  over  multiple  Bayesian 
posteriors  constructed  from  multiple  draws  of  the  true  parameters  from  the  prior  (e.g.,  by  running 
MCMC  multiple  times  using  different  true  parameter  sets),  both  ETKF  and  QEF  can  be  shown  to 
provide  a  realistic  estimate  of  the  average  posterior  analysis  distribution,  but  with  larger 
ensemble  variance  than  that  of  the  average  of  the  true  posterior  analysis  distributions. 

3.  A  filter  constructed  with  a  nonlinear  update  that  accounts  for  the  effects  of  skewness  in  the 
prior  and  posterior  distribution  produces  on  average  an  estimate  that  is  more  consistent  with  the 
true  posterior  ensemble  mean,  but  which  still  fails  for  cases  with  non-monotonic  nonlinearity. 

For these cases, the mean is closer to the true mean than that of an EnKS algorithm but, like the EnKS,
is also not restricted to regions of phase space where known physically consistent solutions exist.
In  addition,  for  state  estimates  hard  bounded  at  some  value  (e.g.,  zero  for  concentrations  of  scalar 
quantities),  a  significant  portion  of  the  posterior  ensemble  density  may  lie  in  an  unphysical 
region  of  the  parameter  space.  This  becomes  more  marked  when  the  observed  concentrations 
and/or  parameter  values  approach  the  specified  limit. 
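Conclusion 1 above (the "error of the day" is not reproduced) can be made concrete with a scalar sketch. It assumes a standard normal prior and a hypothetical nonlinear response h(x) = exp(x), and it is not the ETKF or QEF implementation used in the project. The Kalman-type analysis variance is built only from prior ensemble statistics and the observation error variance, so it is the same for every observed value. The Bayesian posterior variance, evaluated here directly on a grid, changes with the observation.

```python
import numpy as np

def h(x):
    """Hypothetical nonlinear (monotonic) parameter-observation response."""
    return np.exp(x)

def kalman_analysis_var(prior_var, cov_xy, var_y, obs_var):
    """Analysis variance of an ensemble Kalman update; it does not involve y_obs."""
    return prior_var - cov_xy ** 2 / (var_y + obs_var)

def bayes_posterior_var(y_obs, obs_var, grid, prior_pdf):
    """Posterior variance from direct evaluation of Bayes' rule on a grid."""
    weights = prior_pdf * np.exp(-0.5 * (y_obs - h(grid)) ** 2 / obs_var)
    weights /= weights.sum()
    mean = np.sum(weights * grid)
    return np.sum(weights * (grid - mean) ** 2)

obs_var = 0.1
rng = np.random.default_rng(3)
ens = rng.normal(0.0, 1.0, 2000)                 # prior ensemble, standard normal
hx = h(ens)                                      # ensemble mapped to observation space
enkf_var = kalman_analysis_var(ens.var(ddof=1), np.cov(ens, hx)[0, 1],
                               hx.var(ddof=1), obs_var)

grid = np.linspace(-6.0, 6.0, 4001)
prior_pdf = np.exp(-0.5 * grid ** 2)             # unnormalized standard normal prior
bayes_vars = {y: bayes_posterior_var(y, obs_var, grid, prior_pdf) for y in (0.3, 1.0, 3.0)}

print("ensemble Kalman analysis variance (same for any y_obs):", enkf_var)
for y, v in bayes_vars.items():
    print(f"Bayesian posterior variance for y_obs = {y}: {v:.4f}")
```

In this toy setting the single ensemble variance sits between the small Bayesian variance found where the response is steep and the large one found where it is flat, which is in line with its interpretation as an average over possible observations.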

6.  Summary 

This  project  led  to  significant  advances  in  the  understanding  of  ensemble  data  assimilation 
theory,  and  has  subsequently  supported  the  development  of  new  assimilation  schemes  more 
suitable  for  nonlinear  systems.  MCMC  was  shown  to  be  a  robust  tool  for  the  evaluation  of 
ensemble  data  assimilation  schemes,  and  exploration  of  how  MCMC  may  be  used  to  evaluate 
new state-of-the-art assimilation systems is now underway.